Libraries
knitr::opts_chunk$set(echo = TRUE, message = F, warning = F, fig.align = 'center')
suppressPackageStartupMessages(library(tidyverse)) # Data manipulation
library(googlesheets4) # Reading in google sheets
suppressPackageStartupMessages(library(janitor)) # For cleaning
library(naniar) # For visualizing missing data
suppressPackageStartupMessages(library(lares)) # For visualizing missing data
library(readr) # For writing and reading data
library(here) # For good file structure and replication
## here() starts at /Users/dunk/Projects/TeachingLab
library(gt) # Nice tables
htmltools::tagList(rmarkdown::html_dependency_font_awesome()) # Needed so fa's in footer will show

Reading in the Data
The first thing I did was make a copy of the data as a Google Sheet, because googlesheets4 cannot read in a sheet that is saved as an xls. I could also have just saved it as an xls and read that file into R. If you try to replicate this, googlesheets4 will request an OAuth token for a Google account, so make sure to run the code in R before knitting.
teaching_lab_df <- read_sheet("https://docs.google.com/spreadsheets/d/10R_NGttRICpz5uIMjIfcNmYDC4cOkzm1Esl_eDnOKeo/edit#gid=0",
                              col_types = "Dcccccccccccccccccccdcc") # col_types: D = date, c = character, d = double
# Could easily be done with readxl::read_excel as well

Initial Automated Cleaning
I use janitor to convert the column names to snake_case (lowercase, with underscores in place of spaces and punctuation) so I don't have to type out the full question text.
teaching_df <- teaching_lab_df %>%
  # Regex cleanup of column names with dplyr. As written, the pattern only matches a
  # literal "..*" at the end of a name, so dotted suffixes such as "....13" are left
  # in place and survive as _12/_13/_16/_17 after clean_names()
  rename_with( ~ gsub("\\.\\.\\*$", "", .)) %>%
  clean_names() # automated column name cleaning with janitor
compare_df_cols(teaching_df, teaching_lab_df, bind_method = "bind_rows") %>% # check column name difference
  gt::gt() %>%
  tab_header(title = md("**Comparing Columns Before and After Janitor**")) %>%
  tab_options(
    table.background.color = "lightcyan"
  )

Comparing Columns Before and After Janitor

| column_name | teaching_df | teaching_lab_df |
|---|---|---|
| Date for the session | NA | Date |
| date_for_the_session | Date | NA |
| Did you have a second facilitator? | NA | character |
| did_you_have_a_second_facilitator | character | NA |
| Do you have additional comments? | NA | character |
| do_you_have_additional_comments | character | NA |
| How likely are you to apply this learning to your practice in the next 4-6 weeks? | NA | character |
| How likely are you to recommend this professional learning to a colleague or friend? | NA | numeric |
| how_likely_are_you_to_apply_this_learning_to_your_practice_in_the_next_4_6_weeks | character | NA |
| how_likely_are_you_to_recommend_this_professional_learning_to_a_colleague_or_friend | numeric | NA |
| I am satisfied with the overall quality of today's professional learning session. | NA | character |
| i_am_satisfied_with_the_overall_quality_of_todays_professional_learning_session | character | NA |
| overall_what_went_well_in_this_professional_learning | character | NA |
| Overall, what went well in this professional learning? | NA | character |
| Professional training session | NA | character |
| professional_training_session | character | NA |
| s_he_effectively_built_a_community_of_learners_13 | character | NA |
| s_he_effectively_built_a_community_of_learners_17 | character | NA |
| s_he_facilitated_the_content_clearly_12 | character | NA |
| s_he_facilitated_the_content_clearly_16 | character | NA |
| S/he effectively built a community of learners....13 | NA | character |
| S/he effectively built a community of learners....17 | NA | character |
| S/he facilitated the content clearly....12 | NA | character |
| S/he facilitated the content clearly....16 | NA | character |
| Select the best description for your role. | NA | character |
| Select the grade-band(s) you focused on. | NA | character |
| Select the name of your first facilitator. | NA | character |
| Select the name of your second facilitator. | NA | character |
| Select your site (district, parish, or network). | NA | character |
| select_the_best_description_for_your_role | character | NA |
| select_the_grade_band_s_you_focused_on | character | NA |
| select_the_name_of_your_first_facilitator | character | NA |
| select_the_name_of_your_second_facilitator | character | NA |
| select_your_site_district_parish_or_network | character | NA |
| The activities of today's session were well-designed to help me learn. | NA | character |
| the_activities_of_todays_session_were_well_designed_to_help_me_learn | character | NA |
| Today's topic was relevant for my role. | NA | character |
| todays_topic_was_relevant_for_my_role | character | NA |
| What could have improved your experience? | NA | character |
| What is the learning from this professional learning that you are most excited about trying out? | NA | character |
| what_could_have_improved_your_experience | character | NA |
| what_is_the_learning_from_this_professional_learning_that_you_are_most_excited_about_trying_out | character | NA |
| Which activities best supported your learning? | NA | character |
| which_activities_best_supported_your_learning | character | NA |
| Why did you choose this rating? | NA | character |
| why_did_you_choose_this_rating | character | NA |
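
The renaming shown in the table can be reproduced on a few names in isolation. This is a minimal sketch, assuming janitor's `make_clean_names()` with its default snake case (the function that `clean_names()` applies to column names); `raw_names` is a hypothetical vector holding three of the original headers. It also shows that the `gsub()` pattern from the pipeline leaves these names untouched.

```r
library(janitor)

# Three of the original sheet headers, copied from the comparison table
raw_names <- c(
  "Date for the session",
  "Today's topic was relevant for my role.",
  "S/he facilitated the content clearly....12"
)

# The pattern from the pipeline needs a literal "..*" at the end of a name,
# so none of these headers change
after_gsub <- gsub("\\.\\.\\*$", "", raw_names)
identical(after_gsub, raw_names)

# Lowercase, drop apostrophes, and replace spaces and punctuation with "_"
cleaned <- make_clean_names(after_gsub)
cleaned
```

Because the numeric suffixes are kept, the two "facilitated the content clearly" columns stay distinguishable as `_12` and `_16`.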
Secondary Data Cleaning
Here I take a quick look at the data before secondary cleaning and shorten some of the ridiculously long column names. It is notable that not a single row is entirely blank, so everyone answered at least some of the questions.
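
The cleaning chunk that follows also converts the agreement-scale columns to factors with an explicit level order. As a minimal sketch of what that does, here is the same idea on hypothetical toy data (`toy`, `q1`, and `q2` are made up for illustration), using dplyr's `across()`. One thing to watch for: any response that does not exactly match a level becomes `NA`.

```r
library(dplyr)

# The five agreement levels, ordered from most negative to most positive
agree_levels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
                  "Agree", "Strongly agree")

# Hypothetical toy responses; "agree" is deliberately lowercase
toy <- data.frame(
  q1 = c("Agree", "Strongly agree", "agree", NA),
  q2 = c("Disagree", "Agree", "Agree", "Strongly disagree"),
  stringsAsFactors = FALSE
)

toy_fct <- toy %>%
  mutate(across(c(q1, q2), ~ factor(.x, levels = agree_levels)))

levels(toy_fct$q1)      # the five levels, in the order given above
sum(is.na(toy_fct$q1))  # 2: the original NA plus the unmatched "agree"
```

Comparing the `NA` count before and after the conversion is a cheap way to spot responses that were silently dropped.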
cols <- c("professional_training_session",
"select_your_site_district_parish_or_network",
"select_the_best_description_for_your_role",
"select_the_grade_band_s_you_focused_on",
"learning_session_satisfaction",
"todays_topic_was_relevant_for_my_role",
"designed_to_help_me_learn",
"likely_to_apply_46_weeks",
"select_the_name_of_your_first_facilitator",
"s_he_facilitated_the_content_clearly_12",
"did_you_have_a_second_facilitator",
"select_the_name_of_your_second_facilitator",
"s_he_facilitated_the_content_clearly_16",
"s_he_effectively_built_a_community_of_learners_17") # Vector of columns to make factors
cols_agree <- c("learning_session_satisfaction",
"todays_topic_was_relevant_for_my_role",
"designed_to_help_me_learn",
"likely_to_apply_46_weeks",
"s_he_facilitated_the_content_clearly_12",
"s_he_facilitated_the_content_clearly_16",
"s_he_effectively_built_a_community_of_learners_17",
"s_he_effectively_built_a_community_of_learners_13") # Vector of columns to add levels
teaching_df <- teaching_df %>%
  rename(learning_session_satisfaction = i_am_satisfied_with_the_overall_quality_of_todays_professional_learning_session,
         designed_to_help_me_learn = the_activities_of_todays_session_were_well_designed_to_help_me_learn,
         learning_to_try_out = what_is_the_learning_from_this_professional_learning_that_you_are_most_excited_about_trying_out,
         likely_to_apply_46_weeks = how_likely_are_you_to_apply_this_learning_to_your_practice_in_the_next_4_6_weeks,
         likely_to_recommend_colleague_friend = how_likely_are_you_to_recommend_this_professional_learning_to_a_colleague_or_friend) %>%
  # Convert the columns listed in cols to factors (across() replaces the deprecated mutate_each_() and funs())
  mutate(across(all_of(cols), factor)) %>%
  # Give the agreement-scale columns an explicit level order
  mutate(across(all_of(cols_agree),
                ~ factor(., levels = c("Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree"))))
# First rename some of the columns, then make them factors and add levels; cols and cols_agree select which variables to do this for

Visualizing Missing Data
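
As a minimal sketch of how missingness can be inspected with naniar, the example below uses a hypothetical toy data frame (`toy` and its columns are made up for illustration, not taken from the survey). It checks whether any row is entirely blank, then summarises and plots missing values per column with `miss_var_summary()` and `vis_miss()`.

```r
library(naniar)

# Hypothetical toy survey responses with some missing answers
toy <- data.frame(
  role = c("Teacher", NA, "Coach"),
  rating = c(9, NA, 10),
  comments = c(NA, "Great session", NA),
  stringsAsFactors = FALSE
)

# A row is entirely blank when every one of its columns is NA
all_blank <- rowSums(is.na(toy)) == ncol(toy)
any(all_blank)

# Missing count and percentage for each column
miss_var_summary(toy)

# Heatmap-style overview of which cells are missing
vis_miss(toy)
```

The same calls should work on `teaching_df` in place of `toy`.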
original_columns <- colnames(teaching_lab_df) # Just useful to have for later, not important if knit

Saving Data
# write_rds(teaching_df, here("Data/final_df.rds"))
# write_rds(teaching_lab_df, here("Data/original_df.rds"))
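
Saving as RDS rather than a plain-text format keeps the factor columns and their level order intact. A minimal sketch of the round trip, assuming a temporary file instead of the project's `Data/` folder and a hypothetical one-column data frame:

```r
library(readr)

# Hypothetical data frame with a factor whose level order matters
toy <- data.frame(
  rating = factor(c("Agree", "Disagree"), levels = c("Disagree", "Agree"))
)

path <- tempfile(fileext = ".rds") # temporary path, for illustration only
write_rds(toy, path)
restored <- read_rds(path)

# The level order survives the round trip
identical(levels(restored$rating), c("Disagree", "Agree"))
```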